137-04 


A Bayesian Approach to Uncertainty Modelling in OWL Ontology


Zhongli DING, Yun PENG, Rong PAN 


Abstract— Dealing with uncertainty is crucial in ontology 
engineering tasks such as domain modelling, ontology reasoning, 
and concept mapping between ontologies. This paper presents our 
on-going research on modelling uncertainty in ontologies based 
on Bayesian networks (BN). This includes 1) extending OWL to 
allow additional probabilistic markups for attaching probability 
information, 2) directly converting a probabilistically annotated 
OWL ontology into a BN structure by a set of structural 
translation rules, and 3) constructing the conditional probability 
tables (CPTs) of this BN using a new method based on the iterative
proportional fitting procedure (IPFP). The translated BN can
support more accurate ontology reasoning under uncertainty in the
form of Bayesian inference.

Index Terms— Bayesian Networks, IPFP, Ontology, Semantic 
Web, Uncertainty. 

I. Introduction and Motivation 

In the semantic web [17], an important component of an
ontology defined in OWL [18] or RDF(S) [19] is the
taxonomical concept subsumption hierarchy based on class
axioms (defined by rdfs:subClassOf, owl:equivalentClass, and
owl:disjointWith) and logical relations among the concept
classes (defined by owl:unionOf, owl:intersectionOf, and
owl:complementOf). Such an ontology taxonomy definition is
based on crisp logic and thus cannot quantify the degree of
overlap or inclusion between two concepts, cannot support
reasoning about how close a description D is to its most specific
subsumer and most general subsumee, and tends to over-generalize
with noisy input [2]. Uncertainty becomes more
prevalent in the web environment when more than one ontology
is involved, since a concept defined in one ontology often
finds only partial matches to one or more
concepts in another ontology.

To model uncertainty in ontology representation, reasoning,
and mapping, this paper presents a new probabilistic extension
to OWL ontology taxonomy based on Bayesian networks (BN)
[1], a widely used graphical model of dependencies among
variables. In our approach, OWL is first augmented to allow
additional probabilistic markups so that probability values can
be attached to individual concepts in an ontology. Second, a
set of structural translation rules is defined to convert this
probabilistically annotated OWL ontology taxonomy into a
directed acyclic graph (DAG) of a BN. Finally, the BN is
completed by constructing conditional probability tables
(CPTs) for each node in the DAG.

To help understand our approach, in the remainder of this
section we give a brief review of OWL [18] and BN [1].

A. Web Ontology Language (OWL) 

An OWL document can include an optional ontology header
and any number of classes, properties, axioms, and individual
descriptions. In an ontology defined by OWL, a named class is
described by a class identifier. An anonymous class can be
described by a value restriction (owl:hasValue, owl:allValuesFrom,
owl:someValuesFrom) or a cardinality restriction (owl:cardinality,
owl:maxCardinality, owl:minCardinality) on a property
(owl:Restriction); by exhaustively enumerating all the
individuals that form the instances of this class (owl:oneOf); or
by a logical operation on two or more classes (owl:unionOf,
owl:intersectionOf, owl:complementOf). Three class axioms
(rdfs:subClassOf, owl:equivalentClass, owl:disjointWith) can
be used for defining necessary and sufficient conditions of a
class. Two kinds of properties can be defined: object properties
(owl:ObjectProperty), which link individuals to individuals,
and datatype properties (owl:DatatypeProperty), which link
individuals to data values. rdfs:subPropertyOf is used to
state that one property is a subproperty of another property.
Besides these most commonly used constructors, there are
some other constructors (e.g., owl:equivalentProperty and
owl:inverseOf to relate two properties; owl:FunctionalProperty
and owl:InverseFunctionalProperty to impose cardinality
restrictions on properties; etc.).

The semantics of OWL is defined based on model theory in
a way analogous to the semantics of description logic (DL).
With this vocabulary (mostly as described above), one can
define an ontology as a set of (restricted) RDF triples, which
can be represented as an RDF graph.

B. Bayesian Network 

In the most general form, a BN of n variables consists of a
DAG of n nodes and a number of arcs. Nodes $X_i$ in a DAG
correspond to random variables, and directed arcs between two
nodes represent direct causal or influential relations from one
variable to the other. The uncertainty of the causal relationship
is represented locally by the CPT $P(X_i \mid \pi_i)$ associated with
each node $X_i$, where $\pi_i$ is the parent set of $X_i$. Under a
conditional independence assumption, the joint probability
distribution of $X = (X_1, \ldots, X_n)$ can be factored as a
product of the CPTs in the network (named “the chain rule of
BN”):

$$P(X = x) = \prod_{i=1}^{n} P(X_i \mid \pi_i)$$

With the joint probability distribution, BN supports, at least
in theory, any probabilistic inference in the joint space.
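As a minimal illustration of the chain rule, the following Python sketch enumerates the joint distribution of a two-node network by multiplying CPT entries. The network (a hypothetical class Animal with a subclass Human) and its numbers are illustrative assumptions, not taken from this paper.

```python
from itertools import product

# Hypothetical network: node -> (parents, {parent values: P(node = True | parents)})
NET = {
    "Animal": ((), {(): 0.5}),
    "Human": (("Animal",), {(True,): 0.2, (False,): 0.0}),
}
ORDER = ["Animal", "Human"]  # a topological order of the DAG

def joint(net, order):
    """Return {assignment tuple: P(X = x)} using P(X) = prod_i P(X_i | pi_i)."""
    dist = {}
    for values in product([True, False], repeat=len(order)):
        x = dict(zip(order, values))
        p = 1.0
        for node in order:
            parents, cpt = net[node]
            p_true = cpt[tuple(x[q] for q in parents)]
            p *= p_true if x[node] else 1.0 - p_true
        dist[values] = p
    return dist

if __name__ == "__main__":
    for values, p in joint(NET, ORDER).items():
        print(dict(zip(ORDER, values)), round(p, 3))
```

Because the subclass CPT puts zero probability on Human without Animal, the corresponding joint entries vanish, which is how the crisp subsumption survives in the probabilistic model.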

Besides the power of probabilistic reasoning provided by
BN itself, we are also attracted to BN in this work by the
structural similarity between the DAG of a BN and the RDF
graph of an OWL ontology: both of them are directed graphs, and
direct correspondence exists between many nodes and arcs in
the two graphs. In this work, we only consider an ontology
taxonomy that uses only constructors for the terminological
part of DL. Constructors related to properties, individuals, and
datatypes will be considered in the future.

The rest of this paper is organized as follows: Section II
extends OWL for encoding probabilities into an ontology;
Section III presents a set of rules that are used to translate an
OWL ontology into the DAG of a BN; Section IV develops a
method to construct CPTs for each node in the DAG; Section
V briefly discusses how ontology reasoning may be performed
over this translated BN. The paper concludes in Section VI
with discussions of related work and directions for future
research.

II. Encoding Probabilities in Ontology 

The model-theoretic semantics of OWL [18] treats the
domain as a non-empty collection of individuals. If classes A
and B represent two concepts, we treat them as binary random
variables and interpret $P(A = a)$ as the prior
probability, or one’s belief, that an arbitrary individual belongs
to class A, and $P(a \mid b)$ as the conditional probability that an
individual of class B also belongs to class A. Similarly, we
can interpret $P(\bar{a})$, $P(\bar{a} \mid b)$, $P(a \mid \bar{b})$, and $P(\bar{a} \mid \bar{b})$ with the
negation interpreted as “not belonging to”. These two types of
probabilities (prior or conditional) correspond naturally to
classes and relations in an ontology, and are most likely to be
available to ontology designers. Currently, our translation
framework can encode two types of probabilistic information
into the original ontology: for a concept class C and its parent
superconcept class set $\pi_C$,

(1) Prior or marginal probability $P(C)$;

(2) Conditional probability $P(C \mid O_C)$, where $O_C \subseteq \pi_C$,
$\pi_C \neq \emptyset$, $O_C \neq \emptyset$.

To add such uncertainty information into an existing
ontology, we treat a probability as a kind of resource, and
define two OWL classes: “PriorProb” and “CondProb”. A
probability of the form $P(C)$ is defined as an instance of
class “PriorProb”, which has two mandatory properties:
“hasVariable” and “hasProbValue”. A probability of the form
$P(C \mid O_C)$ is defined as an instance of class “CondProb” with
three mandatory properties: “hasCondition” (at least one),
“hasVariable”, and “hasProbValue”. The range of properties
“hasCondition” and “hasVariable” is a defined class named
“Variable”, which has two mandatory properties: “hasClass”
and “hasState”. “hasClass” points to the concept class this
probability is about, and “hasState” gives the “True” (belongs
to) or “False” (does not belong to) state of this probability.

For example, $P(c) = 0.8$, the prior probability that an
arbitrary individual belongs to class C, can be expressed as
follows:

<Variable rdf:ID="c">
  <hasClass>C</hasClass>
  <hasState>True</hasState>
</Variable>

<PriorProb rdf:ID="P(c)">
  <hasVariable>c</hasVariable>
  <hasProbValue>0.8</hasProbValue>
</PriorProb>

and $P(c \mid p_1, p_2, p_3) = 0.8$, the conditional probability that an
individual of the intersection class of P1, P2, and P3 also
belongs to class C, can be expressed as follows:

<Variable rdf:ID="c">
  <hasClass>C</hasClass>
  <hasState>True</hasState>
</Variable>

<Variable rdf:ID="p1">
  <hasClass>P1</hasClass>
  <hasState>True</hasState>
</Variable>

<Variable rdf:ID="p2">
  <hasClass>P2</hasClass>
  <hasState>True</hasState>
</Variable>

<Variable rdf:ID="p3">
  <hasClass>P3</hasClass>
  <hasState>True</hasState>
</Variable>

<CondProb rdf:ID="P(c|p1, p2, p3)">
  <hasCondition>p1</hasCondition>
  <hasCondition>p2</hasCondition>
  <hasCondition>p3</hasCondition>
  <hasVariable>c</hasVariable>
  <hasProbValue>0.8</hasProbValue>
</CondProb>

For simplicity, namespaces are omitted in the above
examples. For a complete definition of the probabilistic markups,
please refer to: http://www.csee.umbc.edu/--zdingl/owl/prob.owl .
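The markup above is regular enough to generate mechanically. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree; the helper names (variable, prior_prob, cond_prob) are hypothetical, and, as in the examples, namespaces are omitted, so `rdf:ID` is written as a plain attribute name rather than a namespace-qualified one.

```python
import xml.etree.ElementTree as ET

def variable(var_id, cls, state=True):
    """<Variable>: the class a probability is about, and its True/False state."""
    v = ET.Element("Variable", {"rdf:ID": var_id})  # no namespace handling (assumption)
    ET.SubElement(v, "hasClass").text = cls
    ET.SubElement(v, "hasState").text = "True" if state else "False"
    return v

def prior_prob(var_id, value):
    """<PriorProb> for P(var_id) = value."""
    p = ET.Element("PriorProb", {"rdf:ID": f"P({var_id})"})
    ET.SubElement(p, "hasVariable").text = var_id
    ET.SubElement(p, "hasProbValue").text = str(value)
    return p

def cond_prob(var_id, cond_ids, value):
    """<CondProb> for P(var_id | cond_ids) = value; needs at least one condition."""
    if not cond_ids:
        raise ValueError("CondProb needs at least one hasCondition")
    p = ET.Element("CondProb", {"rdf:ID": f"P({var_id}|{', '.join(cond_ids)})"})
    for c in cond_ids:
        ET.SubElement(p, "hasCondition").text = c
    ET.SubElement(p, "hasVariable").text = var_id
    ET.SubElement(p, "hasProbValue").text = str(value)
    return p

if __name__ == "__main__":
    elements = [variable("c", "C"), variable("p1", "P1"), variable("p2", "P2"),
                variable("p3", "P3"), cond_prob("c", ["p1", "p2", "p3"], 0.8)]
    for element in elements:
        print(ET.tostring(element, encoding="unicode"))
```

A production version would declare the RDF namespace and the namespace of the probabilistic markups rather than emitting prefixed names verbatim.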

III. Structural Translation 

The ontology augmented with probability values as
described in Section II is still an OWL file. It can be
translated into a BN by first forming a DAG following a set of
rules. The general principle underlying these rules is that all
classes (specified as “subjects” and “objects” in RDF triples of
the OWL file) are translated into nodes in the BN, and an arc is
drawn between two nodes in the BN if the corresponding two
classes are related by a “predicate” in the OWL file, with the
direction from the superclass to the subclass if it can be
determined. Control nodes are created during the translation to
facilitate modelling relations among class nodes that are
related by OWL logical operators. These structural translation
rules are summarized as follows:

(1) Every primitive or defined concept class C is mapped
into a two-state (either “True” or “False”) variable node in the
translated BN; C is in the “True” state when an instance c
belongs to it;

(2) There is a directed arc from a parent superclass node to
a subclass node; for example, a concept class C defined with
superconcept classes $C_i$ ($i = 1, \ldots, n$) by “rdfs:subClassOf” is
mapped into a subnet in the translated BN with one converging
connection (Fig. 1) from each $C_i$ to C;



(3) A concept class C defined by the set intersection operation
(owl:intersectionOf) of concept classes $C_i$ ($i = 1, \ldots, n$) is
mapped into a subnet (Fig. 2) in the translated BN with one
converging connection from each $C_i$ to C, and one
converging connection from C and each $C_i$ to a control node
called “Bridge_Intersection”;



Fig. 2. - “owl:intersectionOf”

(4) A concept class C defined by the set union operation
(owl:unionOf) of concept classes $C_i$ ($i = 1, \ldots, n$) is mapped
into a subnet (Fig. 3) in the translated BN with arcs
from C to each $C_i$, and one converging
connection from C and each $C_i$ to a control node called
“Bridge_Union”;




Fig. 4. - “owl:complementOf, owl:equivalentClass, owl:disjointWith”


(5) If two concept classes $C_1$ and $C_2$ are related by a
complement (owl:complementOf), equivalent (owl:equivalentClass),
or disjoint (owl:disjointWith) relation, then a control
node (named “Bridge_Complement”, “Bridge_Equivalent”, or
“Bridge_Disjoint”, respectively, as in Fig. 4) is added to the
translated BN, and there are directed links from $C_1$ and $C_2$ to
this node.

Based on rules (1) to (5), the translated BN contains two
kinds of nodes: regular nodes for concept classes, and control
nodes that bridge the nodes associated by logical
relations. The CPT of a control node is set in such a way
that when the state of this control node is set to “True”, the
corresponding logical relation among its parent concept class
nodes holds (see Subsection IV.A for more details). By
using control nodes, the logical relations are separated from
the “rdfs:subClassOf” relation, so the in-arcs to a regular node
C come only from its parent superclass nodes, which
makes C’s CPT smaller and easier to construct compared to
our earlier method in [2]. In the translated BN, all the arcs are
directed based on OWL statements; two concept class nodes
without any defined or derived relations are d-separated from
each other, and two implicitly dependent concept class nodes
are d-connected with each other even though there is no arc between
them.
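The rules can be sketched in Python as a function from a list of axioms to arcs and control nodes. The axiom encoding (tuples such as ("intersectionOf", C, [C1, C2])) and the naming of control nodes (e.g. Bridge_Intersection_Man) are hypothetical choices for illustration; a real implementation would read the RDF triples of the OWL file instead.

```python
def translate(axioms):
    """Apply translation rules (2)-(5) to a list of (kind, class, other) axioms.

    Returns (arcs, controls): arcs is a set of (parent, child) pairs between
    class nodes; controls maps each control node to its ordered parent list.
    Rule (1), one binary node per class, is implicit in the node names.
    """
    arcs, controls = set(), {}
    for kind, c, other in axioms:
        if kind == "subClassOf":              # rule (2): C_i -> C
            arcs.update((ci, c) for ci in other)
        elif kind == "intersectionOf":        # rule (3): C_i -> C, plus a bridge
            arcs.update((ci, c) for ci in other)
            controls[f"Bridge_Intersection_{c}"] = [*other, c]
        elif kind == "unionOf":               # rule (4): C -> C_i, plus a bridge
            arcs.update((c, ci) for ci in other)
            controls[f"Bridge_Union_{c}"] = [*other, c]
        elif kind in ("complementOf", "equivalentClass", "disjointWith"):  # rule (5)
            name = {"complementOf": "Complement", "equivalentClass": "Equivalent",
                    "disjointWith": "Disjoint"}[kind]
            controls[f"Bridge_{name}_{c}_{other}"] = [c, other]
        else:
            raise ValueError(f"unsupported axiom type: {kind}")
    return arcs, controls

# The example ontology of Subsection IV.E, in this hypothetical encoding.
EXAMPLE = [
    ("subClassOf", "Male", ["Animal"]),
    ("subClassOf", "Female", ["Animal"]),
    ("subClassOf", "Human", ["Animal"]),
    ("disjointWith", "Male", "Female"),
    ("intersectionOf", "Man", ["Male", "Human"]),
    ("intersectionOf", "Woman", ["Female", "Human"]),
    ("unionOf", "Human", ["Man", "Woman"]),
]

if __name__ == "__main__":
    arcs, controls = translate(EXAMPLE)
    print(sorted(arcs))
    print(controls)
```

Storing arcs in a set means that the arc Human -> Man, produced both by the intersection rule and by the union rule, appears only once in the DAG.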

IV. Constructing CPTs 

Once the network structure is built, the last step to
complete the translation is to assign a conditional probability
table (CPT) $P(C \mid \pi_C)$ to each variable node C in the
structure, where $\pi_C$ is the set of all parent nodes of C. From
the structural translation we know that all nodes X in the
translated BN can be partitioned into two subsets: regular
nodes $X_R$, which denote concept classes, and control nodes
$X_C$, which bridge nodes that are associated by logical relations.
For a regular node $C \in X_R$, as described in Section II, we
have a prior probability $P(C)$ attached to it if it does not have
any parent nodes, or a conditional probability $P(C \mid O_C)$
attached to it if its parent set $\pi_C \neq \emptyset$ and $O_C \subseteq \pi_C$. Details
about how to construct CPTs for regular nodes in $X_R$ based
on the probabilistic information attached in the probabilistically
annotated ontology are given later in Subsection C. Here
we deal with the CPTs for the control nodes in $X_C$ first.

A. CPTs for Control Nodes 

Based on the structural translation rules, there are five types
of control nodes corresponding to the five logical relations in
OWL. They are “Bridge_Complement”, “Bridge_Disjoint”,
“Bridge_Equivalent”, “Bridge_Intersection”, and “Bridge_Union”.
Their CPTs are determined by the logical relation among their
parent concept class nodes, as specified next.

Fig. 3. - “owl:unionOf”

(1) Bridge_Complement (Table 1): When its state is set to
“True”, $C_1$ and $C_2$ are complements of each other;


Table 1 - CPT of Bridge_Complement

C1     C2       True (%)  False (%)
True   True        0.000     100.00
True   False      100.00      0.000
False  True       100.00      0.000
False  False       0.000     100.00


(2) Bridge_Disjoint (Table 2): When its state is set to
“True”, $C_1$ and $C_2$ are disjoint from each other;


Table 2 - CPT of Bridge_Disjoint

C1     C2       True (%)  False (%)
True   True        0.000     100.00
True   False      100.00      0.000
False  True       100.00      0.000
False  False      100.00      0.000


(3) Bridge_Equivalent (Table 3): When its state is set to
“True”, $C_1$ and $C_2$ are equivalent to each other;


Table 3 - CPT of Bridge_Equivalent

C1     C2       True (%)  False (%)
True   True       100.00      0.000
True   False       0.000     100.00
False  True        0.000     100.00
False  False      100.00      0.000


(4) Bridge_Intersection (Table 4): When its state is set to
“True”, C is the intersection of $C_1$ and $C_2$;


Table 4 - CPT of Bridge_Intersection

C1     C2     C        True (%)  False (%)
True   True   True       100.00      0.000
True   True   False       0.000     100.00
True   False  True        0.000     100.00
True   False  False      100.00      0.000
False  True   True        0.000     100.00
False  True   False      100.00      0.000
False  False  True        0.000     100.00
False  False  False      100.00      0.000


In a more general case, if a concept class C is the intersection
of $n > 2$ concept classes, then the $2^{n+1}$ entries in the CPT of
“Bridge_Intersection” can be obtained analogously.

(5) Bridge_Union (Table 5): When its state is set to “True”,
C is the union of $C_1$ and $C_2$.

In a more general case, if a concept class C is the union of
$n > 2$ concept classes, then the $2^{n+1}$ entries in the CPT of
“Bridge_Union” can be obtained analogously.

When the CPTs for the control nodes are determined
as above, setting the states of all control nodes to “True”
makes the logical relations defined in the original ontology
hold in the translated BN, which is thus consistent with the
OWL semantics. We denote by CT the situation in which all the
control nodes in the translated BN are in the “True” state.
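Tables 1-5 follow one pattern: a control node is certainly “True” exactly when its parents' states satisfy the logical relation. The Python sketch below generates these CPTs, including the general $2^{n+1}$-row case for intersection and union. It uses probabilities in [0, 1] rather than percentages, and the function name and the parent ordering $(C_1, \ldots, C_n, C)$ are assumptions that mirror Tables 4 and 5.

```python
from itertools import product

# P(control node = True | parent states) is 1.0 when the relation holds, else 0.0.
RELATIONS = {
    "complement":   lambda s: s[0] != s[1],
    "disjoint":     lambda s: not (s[0] and s[1]),
    "equivalent":   lambda s: s[0] == s[1],
    "intersection": lambda s: s[-1] == all(s[:-1]),  # s = (C_1, ..., C_n, C)
    "union":        lambda s: s[-1] == any(s[:-1]),  # s = (C_1, ..., C_n, C)
}

def bridge_cpt(kind, n=2):
    """Return {parent states: P(True)} for a control node of the given kind."""
    holds = RELATIONS[kind]
    arity = n + 1 if kind in ("intersection", "union") else 2
    return {states: 1.0 if holds(states) else 0.0
            for states in product([True, False], repeat=arity)}

if __name__ == "__main__":
    for states, p in bridge_cpt("union").items():  # the rows of Table 5
        print(states, p)
```

Since every entry is 0 or 1, conditioning on CT simply removes every configuration of the concept nodes that violates one of the encoded relations.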

The remaining issue is to construct CPTs for the regular
nodes in $X_R$ so that $P(X_R \mid CT)$, the joint probability
distribution of all regular nodes in the subspace of CT, is
consistent with all the given prior and conditional probabilities
attached to the nodes in $X_R$. This is difficult because 1)
the product of the CPTs of all variables gives the joint distribution
in the general space, not in the subspace of CT (the
dependencies change when going from the general space to
the subspace of CT); and 2) the encoded probabilistic information
is in the form of prior probabilities ($P(C)$) and
conditional probabilities ($P(C \mid O_C)$, $\pi_C \neq \emptyset$, $O_C \subseteq \pi_C$), not
directly in the form of CPTs (C may have other parent nodes
in addition to $O_C$).

To address these issues, we developed an algorithm to 
approximate these CPTs for X R based on the “iterative 
proportional fitting procedure” (IPFP) [3]-[8], a well-known 
mathematical procedure that modifies a given distribution to 
meet a set of constraints while minimizing I-divergence 
(Kullback-Leibler distance) to the original distribution. 

B. Brief Introduction to IPFP 

In this subsection we give a brief introduction to the 
iterative proportional fitting procedure (IPFP), which was first 
published by [3] in 1937, and in [4] it was proposed as a 
procedure to estimate cell frequencies in contingency tables 
under some marginal constraints. In 1975, I. Csiszar [5] 
provided an IPFP convergence proof based on I-divergence 
geometry. J. Vomlel rewrote a discrete version of this proof in 
his PhD thesis [6] in 1999. IPFP was extended in [7], [8] as 
conditional iterative proportional fitting procedure (CIPF-P) to 
also take conditional distributions as constraints, and the 
convergence was established for the finite discrete case. 

We give definitions of I-divergence and I-projection first 
before going into the details of IPFP. In our context, all 
random variables are finite and all probability distributions are 
discrete. 


Table 5 - CPT of Bridge_Union

C1     C2     C        True (%)  False (%)
True   True   True       100.00      0.000
True   True   False       0.000     100.00
True   False  True       100.00      0.000
True   False  False       0.000     100.00
False  True   True       100.00      0.000
False  True   False       0.000     100.00
False  False  True        0.000     100.00
False  False  False      100.00      0.000


Definition 3.1 (I-divergence)

Let $\mathcal{P}$ be a set of probability distributions, and $P, Q \in \mathcal{P}$.
I-divergence (also known as Kullback-Leibler divergence or
cross-entropy, which is often used as a distance measure
between two probability distributions) is defined as:

$$I(P \,\|\, Q) = \begin{cases} \displaystyle\sum_{x \in X,\, P(x) > 0} P(x) \log \frac{P(x)}{Q(x)} & \text{if } P \ll Q \\ +\infty & \text{if } P \not\ll Q \end{cases} \tag{1}$$

Here $P \ll Q$ means that P is dominated by Q, i.e.,

$$\{x \in X \mid P(x) > 0\} \subseteq \{y \in X \mid Q(y) > 0\}$$

where x (or y) is an assignment of X, or equivalently,

$$\{y \in X \mid Q(y) = 0\} \subseteq \{x \in X \mid P(x) = 0\}$$

since a probability value is always non-negative. The
dominance condition in (1) guarantees that division by zero does
not occur, because whenever the denominator Q(x) is zero, the
numerator P(x) is zero as well. Note that the I-divergence is zero if
and only if P and Q are identical, and that it is not symmetric.
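Definition (1) translates directly into code. The Python sketch below assumes the natural logarithm (the base is not fixed above) and represents distributions as dicts from assignments to probabilities; it returns $+\infty$ when the dominance condition fails.

```python
import math

def i_divergence(p, q):
    """I(P || Q) as in (1); p and q map assignments to probabilities."""
    total = 0.0
    for x, px in p.items():
        if px > 0:                 # the sum runs only over x with P(x) > 0
            qx = q.get(x, 0.0)
            if qx == 0.0:          # P is not dominated by Q
                return math.inf
            total += px * math.log(px / qx)
    return total

if __name__ == "__main__":
    p = {"a": 0.9, "b": 0.1}
    q = {"a": 0.5, "b": 0.5}
    print(i_divergence(p, q), i_divergence(q, p))  # the two values differ
```

Swapping the arguments changes the result, which illustrates the asymmetry noted above and is why $I_1$- and $I_2$-projections are distinguished below.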

Definition 3.2 (I-projection)

The $I_1$-projection of a probability distribution $Q \in \mathcal{P}$ on a set
of probability distributions $\mathcal{E}$ is the unique probability
distribution $P^* \in \mathcal{E}$ that minimizes the I-divergence $I(P \,\|\, Q)$
over all $P \in \mathcal{E}$. Similarly,
the $I_2$-projections of Q on $\mathcal{E}$ are the probability distributions in
$\mathcal{E}$ that minimize the I-divergence $I(Q \,\|\, P)$; the $I_2$-projection
is not unique in general.

If $\mathcal{E}$ is a given set of probability distributions that satisfy
all given constraints, the $I_1$-projection $P^* \in \mathcal{E}$ of Q is the
distribution that has the minimum distance from Q among all
those in $\mathcal{E}$ [6].


Definition 3.3 (IPFP)

Let $X = \{X_1, X_2, \ldots, X_n\}$ be a space of discrete random
variables. Given a consistent set of m marginal probability
distributions $\{R(S_i)\}$, where $S_i \subseteq X$ and $S_i \neq \emptyset$, and an initial
probability distribution $Q_{(0)} \in \mathcal{P}$, the iterative proportional fitting
procedure (IPFP) is a procedure for determining a joint
distribution $P(X) = P(X_1, X_2, \ldots, X_n) \ll Q_{(0)}$ that satisfies all
constraints in $\{R(S_i)\}$ by repeating the following computational
process over k, with $i = ((k-1) \bmod m) + 1$:

$$Q_{(k)}(X) = \begin{cases} 0 & \text{if } Q_{(k-1)}(S_i) = 0 \\ Q_{(k-1)}(X) \cdot \dfrac{R(S_i)}{Q_{(k-1)}(S_i)} & \text{if } Q_{(k-1)}(S_i) > 0 \end{cases} \tag{2}$$

This process iterates cyclically over the distributions in $\{R(S_i)\}$.
It can be shown [6] that in each step k, $Q_{(k)}(X)$ is an $I_1$-projection
of $Q_{(k-1)}(X)$ that satisfies the constraint $R(S_i)$, and
$Q^* = \lim_{k \to \infty} Q_{(k)}$ is an $I_1$-projection of $Q_{(0)}$ satisfying all
constraints, i.e., $Q_{(k)}$ converges to $P(X) = P(X_1, X_2, \ldots, X_n)$.

CIPF-P from [7], [8] is an extension of IPFP that allows
constraints in the form of conditional probability
distributions, i.e., $R(S_i \mid L_i)$ where $L_i \subseteq X$. The procedure
can be written as:

$$Q_{(k)}(X) = \begin{cases} 0 & \text{if } Q_{(k-1)}(S_i \mid L_i) = 0 \\ Q_{(k-1)}(X) \cdot \dfrac{R(S_i \mid L_i)}{Q_{(k-1)}(S_i \mid L_i)} & \text{if } Q_{(k-1)}(S_i \mid L_i) > 0 \end{cases} \tag{3}$$

CIPF-P has a convergence result [8] similar to that of IPFP, and (2) is
in fact a special case of (3) with $L_i = \emptyset$.
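Updates (2) and (3) can be sketched in Python over an explicitly enumerated joint distribution. This is an illustration under stated assumptions, not the implementation of [6]-[8]: a distribution is a dict from full assignments (tuples) to probabilities, variables are referred to by position, and a constraint $R(S_i \mid L_i)$ is a dict keyed by (s, l) value tuples that must define a full conditional distribution for every value l. Passing an empty $L_i$ gives the plain IPFP update (2).

```python
def restrict(x, idx):
    """Values of the variables at positions idx in the assignment x."""
    return tuple(x[i] for i in idx)

def cipf_step(q, s_idx, l_idx, r):
    """One update of (3); with l_idx == () it is the IPFP update (2).

    q: dict mapping full assignments (tuples) to probabilities.
    s_idx, l_idx: positions of the variables in S_i and L_i.
    r: dict mapping (s, l) value tuples to R(S_i = s | L_i = l).
    """
    q_sl, q_l = {}, {}
    for x, p in q.items():
        s, l = restrict(x, s_idx), restrict(x, l_idx)
        q_sl[s, l] = q_sl.get((s, l), 0.0) + p
        q_l[l] = q_l.get(l, 0.0) + p
    new_q = {}
    for x, p in q.items():
        s, l = restrict(x, s_idx), restrict(x, l_idx)
        cond = q_sl[s, l] / q_l[l] if q_l[l] > 0 else 0.0  # Q_(k-1)(S_i | L_i)
        new_q[x] = p * r[s, l] / cond if cond > 0 else 0.0
    return new_q

def ipfp(q0, constraints, cycles=100):
    """Apply the constraints cyclically, i = ((k - 1) mod m) + 1."""
    q = dict(q0)
    for _ in range(cycles):
        for s_idx, l_idx, r in constraints:
            q = cipf_step(q, s_idx, l_idx, r)
    return q

if __name__ == "__main__":
    # A 2x2 table with strong association, fitted to new marginals of X0 and X1.
    q0 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    r0 = {((0,), ()): 0.7, ((1,), ()): 0.3}
    r1 = {((0,), ()): 0.4, ((1,), ()): 0.6}
    q = ipfp(q0, [((0,), (), r0), ((1,), (), r1)])
    print({x: round(p, 4) for x, p in q.items()})
```

Because each step of (2) only rescales rows or columns of the table, the fitted distribution keeps the cross-product ratio of $Q_{(0)}$ while matching the target marginals, which reflects the $I_1$-projection property.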


C. Constructing CPT for Regular Nodes 

Let $X = \{X_1, \ldots, X_n\}$ be the set of binary (i.e., $X_i \in \{x_i, \bar{x}_i\}$)
variables in the translated BN, $X_R$ the set of regular nodes,
and $X_C$ the set of control nodes, as stated earlier in this
section. The remaining issue is to construct CPTs $Q(V_i \mid \pi_{V_i})$
for the regular nodes $V_i$ in $X_R$ so that $Q(X_R \mid CT)$, the joint
probability distribution of $X_R$ in the subspace of CT, is
consistent with all the given prior and conditional
probabilities. Again, we restrict the encoded probabilities to
the two forms: (1) prior or marginal probability $P(C)$ and (2)
conditional probability $P(C \mid O_C)$, where $O_C \subseteq \pi_C$, $\pi_C \neq \emptyset$,
$O_C \neq \emptyset$, and each is attached to a node in $X_R$. This is a
constraint satisfaction problem in the scope of IPFP. However,
it would be very expensive in each iteration of (3) to compute
the joint distribution $Q_{(k)}(X)$ over all the variables and then
decompose it into CPTs at the end. We provide a new
algorithm (called Decomposed-IPFP, or D-IPFP for short) to
overcome this problem by utilizing the chain rule of BN [1].
Let $P_{init}(X) = \prod_{X_i \in X} P_{init}(X_i \mid \pi_i)$ be the initial distribution
of the translated BN, where the CPTs for control nodes in $X_C$ are
set as in Subsection A and the CPTs for regular nodes in
$X_R$ are set to arbitrary values that are consistent with
the semantics of the subclass relation between parent and child
nodes. Let $\{R(V_i \mid L_i)\}$ be the set of m given prior ($L_i = \emptyset$) or
conditional ($\pi_{V_i} \supseteq L_i \neq \emptyset$) probability distributions associated
with $V_i \in X_R$. The basic idea of our approach is: in
each iteration step k, instead of computing a new joint
probability distribution $Q_{(k)}(X)$ over all the variables on one
constraint in $\{R(V_i \mid L_i)\}$, we compute a new CPT $Q_{(k)}(V_i \mid \pi_{V_i})$
for node $V_i$ over that constraint. The iteration process loops
over all $R(V_i \mid L_i)$ until Q converges. D-IPFP is
given below:


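$$Q_{(0)}(X) = P_{init}(X) = \prod_{X_i \in X} P_{init}(X_i \mid \pi_i)$$

$$Q_{(k)}(V_i \mid \pi_{V_i}) = Q_{(k-1)}(V_i \mid \pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \cdot \alpha_{k-1}(\pi_{V_i}) \tag{4}$$

where

$$\alpha_{k-1}(\pi_{V_i}) = \frac{1}{\sum_{V_i} Q_{(k-1)}(V_i \mid \pi_{V_i}) \cdot R(V_i \mid L_i) / Q_{(k-1)}(V_i \mid L_i, CT)}$$

is the normalization factor for each possible value assignment
of $\pi_{V_i}$.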

To guarantee the dominance of $Q_{(0)}$, we define
$Q_{(k)}(V_i \mid \pi_{V_i}) = 0$ if $Q_{(k-1)}(V_i \mid L_i, CT) = 0$. It can be shown
(Subsection D) that, if the ontology definition is consistent, given
a consistent and complete input set $\{R(V_i \mid L_i)\}$, $Q_{(k)}$
converges to $Q^*$ with $Q^*(X_R \mid CT)$ an $I_1$-projection of
$P_{init}(X_R \mid CT)$ over $\{R(V_i \mid L_i)\}$, i.e., $Q^*(X_R \mid CT)$ has
minimum Kullback-Leibler distance to $P_{init}(X_R \mid CT)$ and
$\forall V_i \in X_R: Q^*(V_i \mid L_i, CT) = R(V_i \mid L_i)$.
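The following Python sketch illustrates update (4) on a reduced, hypothetical version of the example in Subsection E (only Animal, Male, Female, and a Bridge_Disjoint control node, with the first three constraints). It makes several simplifying assumptions: $Q_{(k-1)}(V_i \mid L_i, CT)$ is obtained by brute-force enumeration of the joint distribution, which is only feasible for tiny networks; a constraint is stored as $R(V_i = \text{True} \mid L_i = \text{True})$, as in the markup of Section II, so only CPT rows whose parent assignment agrees with $L_i$ are updated; and a fixed number of cycles stands in for a convergence test.

```python
from itertools import product

def conditional(net, order, event, given):
    """P(event | given) by enumerating the chain-rule joint (tiny networks only)."""
    num = den = 0.0
    for values in product([True, False], repeat=len(order)):
        x = dict(zip(order, values))
        if any(x[v] != s for v, s in given.items()):
            continue
        p = 1.0
        for node in order:
            parents, cpt = net[node]
            p_true = cpt[tuple(x[u] for u in parents)]
            p *= p_true if x[node] else 1.0 - p_true
        den += p
        if all(x[v] == s for v, s in event.items()):
            num += p
    return num / den

def dipfp_step(net, order, controls, node, cond, r):
    """Update (4) for one constraint R(node = True | cond) = r; cond may be {}."""
    ct = {c: True for c in controls}
    q = conditional(net, order, {node: True}, {**cond, **ct})  # Q_(k-1)(V_i | L_i, CT)
    parents, cpt = net[node]
    for row in cpt:
        assignment = dict(zip(parents, row))
        if any(assignment[v] != s for v, s in cond.items()):
            continue  # the constraint says nothing about this CPT row
        t = cpt[row] * r / q if q > 0 else 0.0
        f = (1.0 - cpt[row]) * (1.0 - r) / (1.0 - q) if q < 1 else 0.0
        if t + f > 0:
            cpt[row] = t / (t + f)  # dividing by t + f is the alpha_{k-1} normalization

def example_network():
    """Hypothetical reduced example; returns fresh (mutable) CPTs on each call."""
    return {
        "Animal": ((), {(): 0.5}),
        "Male": (("Animal",), {(True,): 0.5, (False,): 0.0}),
        "Female": (("Animal",), {(True,): 0.5, (False,): 0.0}),
        # Bridge_Disjoint: P(True | Male, Female) = 0 iff both are True (Table 2)
        "Disjoint": (("Male", "Female"), {(True, True): 0.0, (True, False): 1.0,
                                          (False, True): 1.0, (False, False): 1.0}),
    }

ORDER = ["Animal", "Male", "Female", "Disjoint"]
CONTROLS = ["Disjoint"]
CONSTRAINTS = [("Animal", {}, 0.5),
               ("Male", {"Animal": True}, 0.5),
               ("Female", {"Animal": True}, 0.48)]

def run_dipfp(net, cycles=500):
    """Loop over the constraints in a fixed order for a fixed number of cycles."""
    for _ in range(cycles):
        for node, cond, r in CONSTRAINTS:
            dipfp_step(net, ORDER, CONTROLS, node, cond, r)
    return net

if __name__ == "__main__":
    net = run_dipfp(example_network())
    for name in ("Animal", "Male", "Female"):
        print(name, net[name][1])
```

In this reduced network the procedure settles near P(Male | Animal) = 1/1.04 (about 0.9615) and P(Female | Animal) = 0.96 in the CPTs, while the constrained probabilities in the subspace of CT match 0.5 and 0.48; these CPT values differ from Table 6 because the Human, Man, and Woman nodes are omitted.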


D. Convergence Proof of D-IPFP 

From the previous subsections we have the set of all
variables $X = X_R \cup X_C$ with $X_R \cap X_C = \emptyset$ and $X_R \neq \emptyset$,
where $X_R = \{V_1, \ldots, V_s\}$ denotes the set of binary (i.e.,
$V_i \in \{v_i, \bar{v}_i\}$) regular nodes, and $X_C = \{B_1, \ldots, B_t\}$ denotes the set
of binary (i.e., $B_j \in \{b_j, \bar{b}_j\}$) control nodes (if $X_C \neq \emptyset$).

Probability constraints can be put in the general form
$R(V_i \mid L_i)$ where $L_i \subseteq \pi_{V_i}$. If $L_i = \emptyset$, then the constraint is a
prior or marginal; otherwise, it is a conditional (given some or all
parents of $V_i$).

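By the chain rule of BN [1], the probability distribution of
$X_R = \{V_j \mid j = 1, \ldots, s\}$ in the subspace of CT is:

$$\begin{aligned} Q_{(k)}(X_R \mid CT) &= Q_{(k)}(X_R, CT) / Q_{(k)}(CT) \\ &= Q_{(k)}(X_R \setminus \{V_i\}, V_i, CT) / Q_{(k)}(CT) \\ &= Q_{(k)}(V_i \mid \pi_{V_i}) \cdot \prod_{B_j \in X_C} Q_{(k)}(b_j \mid \pi_{B_j}) \prod_{V_j \in X_R,\, j \neq i} Q_{(k)}(V_j \mid \pi_{V_j}) \Big/ Q_{(k)}(CT) \end{aligned} \tag{5}$$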

From (4) we have:

$$Q_{(k)}(V_i \mid \pi_{V_i}) = \alpha_{k-1}(\pi_{V_i}) \cdot Q_{(k-1)}(V_i \mid \pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \tag{6}$$
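Substituting (6) into (5), and noting that only one table, namely
$Q_{(k)}(V_i \mid \pi_{V_i})$, is changed at iteration k, we obtain

$$\begin{aligned} Q_{(k)}(X_R \mid CT) &= \left[\alpha_{k-1}(\pi_{V_i}) \cdot Q_{(k-1)}(V_i \mid \pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)}\right] \cdot \prod_{B_j \in X_C} Q_{(k)}(b_j \mid \pi_{B_j}) \prod_{V_j \in X_R,\, j \neq i} Q_{(k)}(V_j \mid \pi_{V_j}) \Big/ Q_{(k)}(CT) \\ &= \alpha_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \cdot Q_{(k-1)}(V_i \mid \pi_{V_i}) \prod_{B_j \in X_C} Q_{(k-1)}(b_j \mid \pi_{B_j}) \prod_{V_j \in X_R,\, j \neq i} Q_{(k-1)}(V_j \mid \pi_{V_j}) \Big/ Q_{(k)}(CT) \\ &= \alpha_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \cdot Q_{(k-1)}(X_R \mid CT) \cdot \frac{Q_{(k-1)}(CT)}{Q_{(k)}(CT)} \\ &= \beta_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \cdot Q_{(k-1)}(X_R \mid CT) \end{aligned} \tag{7}$$

where $\beta_{k-1}(\pi_{V_i}) = \alpha_{k-1}(\pi_{V_i}) \cdot Q_{(k-1)}(CT) / Q_{(k)}(CT)$.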


Now we show that $Q_{(k)}$ converges to a limit probability
distribution $Q^*$ and that $Q^*$ fulfills all the given constraints in the
subspace of CT, i.e.,

$$\forall V_i:\ Q^*(V_i \mid L_i, CT) = R(V_i \mid L_i), \quad \text{i.e.,} \quad \forall V_i:\ \lim_{k \to \infty} Q_{(k)}(V_i \mid L_i, CT) = R(V_i \mid L_i) \tag{8}$$

First, we prove that in each iteration step of (4),
$Q_{(k)}(X_R \mid CT)$ is an $I_1$-projection of $Q_{(k-1)}(X_R \mid CT)$ over
some constraint in the subspace of CT. Because rule (4) performs
local updates (it changes CPTs, not the joint distribution of
$X_R$), and because the CPTs are given for the general space
while the constraints are in the subspace of CT, the $I_1$-projection
generated at each iteration does not necessarily meet the given
constraint $R(V_i \mid L_i)$. However, we can show that $Q_{(k)}(X_R \mid CT)$
is an $I_1$-projection of $Q_{(k-1)}(X_R \mid CT)$ over another constraint,
derived from $R(V_i \mid L_i)$, in the subspace of CT.

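Let $L_i' = \pi_{V_i} \setminus L_i$ (i.e., $\pi_{V_i}$ is partitioned into $L_i$ and $L_i'$). We
define a new constraint $R'_{(k)}(V_i \mid \pi_{V_i}, CT)$:

$$R'_{(k)}(V_i \mid L_i, L_i', CT) = \beta_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \cdot Q_{(k-1)}(V_i \mid L_i, L_i', CT) \tag{9}$$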

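To prove that $Q_{(k)}(X_R \mid CT)$ is an $I_1$-projection of
$Q_{(k-1)}(X_R \mid CT)$ over $R'_{(k)}(V_i \mid \pi_{V_i}, CT)$ in the subspace of
CT, from (7) and (9) we have:

$$\begin{aligned} Q_{(k)}(X_R \mid CT) &= \beta_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \cdot Q_{(k-1)}(X_R \mid CT) \cdot \frac{Q_{(k-1)}(V_i \mid L_i, L_i', CT)}{Q_{(k-1)}(V_i \mid L_i, L_i', CT)} \\ &= Q_{(k-1)}(X_R \mid CT) \cdot \frac{R'_{(k)}(V_i \mid L_i, L_i', CT)}{Q_{(k-1)}(V_i \mid L_i, L_i', CT)} \\ &= Q_{(k-1)}(X_R \mid CT) \cdot \frac{R'_{(k)}(V_i \mid \pi_{V_i}, CT)}{Q_{(k-1)}(V_i \mid \pi_{V_i}, CT)} \end{aligned} \tag{10}$$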

Then, from (3), $Q_{(k)}(X_R \mid CT)$ is an $I_1$-projection of
$Q_{(k-1)}(X_R \mid CT)$ over the constraint $R'_{(k)}(V_i \mid \pi_{V_i}, CT)$ in the
subspace of CT, and thus

$$Q_{(k)}(V_i \mid \pi_{V_i}, CT) = R'_{(k)}(V_i \mid \pi_{V_i}, CT) \tag{11}$$

Second, since each iteration is an $I_1$-projection, we can
show (analogously to the convergence proof in [6] (page 22))
that:

$$I\big(Q_{(k)}(X_R \mid CT) \,\|\, Q_{(k-1)}(X_R \mid CT)\big) \to 0 \tag{12}$$

and since all the random variables are finite, based on
Theorem 2.4 of J. Vomlel’s thesis [6] (page 20) and (12), the
sequence $Q_{(0)}, Q_{(1)}, \ldots, Q_{(k-1)}, Q_{(k)}, \ldots$ converges to some limit
probability distribution (denoted $Q^*$), and when $k \to \infty$, we
obtain:

$$Q_{(k)}(X_R \mid CT) \to Q_{(k-1)}(X_R \mid CT) \tag{13}$$

Finally, we show that this $Q^*$ fulfills all given constraints.
Using (13) together with (7), we have:

$$\beta_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \to 1 \tag{14}$$

When $k \to \infty$, we also have $Q_{(k)}(CT) \to Q_{(k-1)}(CT)$, so:

$$\beta_{k-1}(\pi_{V_i}) \to \alpha_{k-1}(\pi_{V_i}) \tag{15}$$

From (14) and (15), we have:

$$\alpha_{k-1}(\pi_{V_i}) \cdot \frac{R(V_i \mid L_i)}{Q_{(k-1)}(V_i \mid L_i, CT)} \to 1, \quad \text{i.e.,} \quad Q_{(k-1)}(V_i \mid L_i, CT) \to \alpha_{k-1}(\pi_{V_i}) \cdot R(V_i \mid L_i)$$

Since both $Q_{(k-1)}(V_i \mid L_i, CT)$ and $R(V_i \mid L_i)$ are probability distributions,
the normalization factor $\alpha_{k-1}(\pi_{V_i}) \to 1$, and we have:

$$\lim_{k \to \infty} Q_{(k)}(V_i \mid L_i, CT) = R(V_i \mid L_i)$$


E. An Example 

We demonstrate the validity of our approach by a simple 
example ontology. In this ontology, "Animal" is a primitive 
concept class; "Male", "Female", "Human" are subclasses of 
"Animal"; "Male" and "Female" are disjoint with each other; 
"Man" is the intersection of "Male" and "Human"; "Woman" 
is the intersection of "Female" and "Human"; "Human" is the 
union of "Man" and "Woman". 

The following constraints or probabilities are attached to
$X_R$ = {Animal, Male, Female, Human, Man, Woman}:

(1) P(Animal) = 0.5;

(2) P(Male | Animal) = 0.5;

(3) P(Female | Animal) = 0.48;

(4) P(Human | Animal) = 0.1;

(5) P(Man | Human) = 0.49;

(6) P(Woman | Human) = 0.51.

We obtained the BN by first constructing the DAG (as
described in Section III), then the CPTs for the nodes in $X_C$ (as
described in Subsection IV.A), and finally approximating the
CPTs of the nodes in $X_R$ by running D-IPFP. Fig. 5 shows
the BN we obtained. It can be seen that, when all control nodes
are set to True, the conditional probabilities of “Male”,
“Female”, and “Human” given “Animal” are 0.5, 0.48, and
0.1, respectively, the same as the given probability constraints.
All other constraints, which are not shown in the figure due to
space limitations, are also satisfied.



The initial CPTs (of nodes in $X_R$) used in this example and
the final CPTs (of nodes in $X_R$) obtained by D-IPFP
are listed in Table 6. Note that in all initial CPTs, the values in the
first row were set to 0.5. They can be set to any arbitrary
values greater than 0 and less than 1. Values in all other rows
were set according to the subclass relation. It can be seen that
the values in the first row of every CPT have been changed from
their initial values.


Table 6 - CPTs of the Example Ontology

Node    Parent states                  Initial arbitrary CPT   Final CPT obtained by D-IPFP
                                       True      False         True       False
Animal  (none)                         0.5       0.5           0.92752    0.07248
Male    Animal = True                  0.5       0.5           0.95677    0.04323
Male    Animal = False                 0         1             0          1
Female  Animal = True                  0.5       0.5           0.95469    0.04531
Female  Animal = False                 0         1             0          1
Human   Animal = True                  0.5       0.5           0.18773    0.81227
Human   Animal = False                 0         1             0          1
Man     Male = True, Human = True      0.5       0.5           0.47049    0.52951
Man     Male = True, Human = False     0         1             0          1
Man     Male = False, Human = True     0         1             0          1
Man     Male = False, Human = False    0         1             0          1
Woman   Female = True, Human = True    0.5       0.5           0.51433    0.48567
Woman   Female = True, Human = False   0         1             0          1
Woman   Female = False, Human = True   0         1             0          1
Woman   Female = False, Human = False  0         1             0          1


F. Discussion of D-IPFP

Other general optimization methods, such as simulated annealing (SA) and genetic algorithms (GA), can also be used to construct the CPTs of the regular nodes in the translated BN. However, they are much more expensive, and the quality of their results is often not guaranteed. In our experiments, D-IPFP converged quickly (within seconds, usually in fewer than 30 iterations), despite its exponential time complexity in theory. The space complexity of D-IPFP is low, since each step manipulates only the CPT of a single node rather than the entire joint probability table.
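The iterative proportional fitting idea can be illustrated with a small Python sketch. Note that this is not D-IPFP: it applies a conditional variant of the classic IPFP update, Q_(k)(x) = Q_(k-1)(x) · R(t | c) / Q_(k-1)(t | c), restricted to the context c, to the full joint table of four binary variables. That table grows as 2^n, which is the cost D-IPFP avoids by updating one CPT at a time. The three conditional constraints are the ones quoted above for the example; the marginal P(Human) = 0.05 is a hypothetical addition that makes the constraints interact, control nodes are not modelled, and all function names are illustrative.

```python
from itertools import product

# Hypothetical toy setting: four binary class variables from the example ontology.
VARS = ("Animal", "Male", "Female", "Human")


def initial_joint():
    """Uniform joint over the states allowed by the subclass axioms
    (Male, Female and Human can be True only when Animal is True).
    Forbidden states are omitted; IPFP never revives a zero."""
    states = [s for s in product([True, False], repeat=len(VARS))
              if s[0] or not any(s[1:])]
    return {s: 1.0 / len(states) for s in states}


def conditional(q, target, cond=None):
    """P(target=True | cond=True) under joint q; a marginal if cond is None."""
    ti = VARS.index(target)
    ci = None if cond is None else VARS.index(cond)
    ctx = {s: p for s, p in q.items() if ci is None or s[ci]}
    return sum(p for s, p in ctx.items() if s[ti]) / sum(ctx.values())


def project(q, target, cond, r):
    """One IPFP step: rescale q so that P(target | cond) = r.
    Mass outside the context cond=True is untouched, so P(cond)
    and the total mass are preserved."""
    ti = VARS.index(target)
    ci = None if cond is None else VARS.index(cond)
    cur = conditional(q, target, cond)
    out = {}
    for s, p in q.items():
        if ci is not None and not s[ci]:
            out[s] = p
        elif s[ti]:
            out[s] = p * r / cur
        else:
            out[s] = p * (1.0 - r) / (1.0 - cur)
    return out


def ipfp(q, constraints, tol=1e-9, max_sweeps=10_000):
    """Apply the constraints cyclically until a full sweep moves no
    state probability by more than tol."""
    for sweep in range(1, max_sweeps + 1):
        new = q
        for target, cond, r in constraints:
            new = project(new, target, cond, r)
        delta = max(abs(new[s] - q[s]) for s in q)
        q = new
        if delta < tol:
            return q, sweep
    raise RuntimeError("IPFP did not converge within max_sweeps")


CONSTRAINTS = [
    ("Male", "Animal", 0.5),     # P(Male | Animal), from the example
    ("Female", "Animal", 0.48),  # P(Female | Animal), from the example
    ("Human", "Animal", 0.1),    # P(Human | Animal), from the example
    ("Human", None, 0.05),       # hypothetical marginal, so the constraints interact
]

joint, sweeps = ipfp(initial_joint(), CONSTRAINTS)
print(f"converged after {sweeps} sweeps over {len(joint)} joint states")
for target, cond, r in CONSTRAINTS:
    given = cond or "-"
    print(f"P({target} | {given}) = {conditional(joint, target, cond):.6f} (target {r})")
```

Each conditional step leaves P(cond) unchanged, so the total mass stays at 1; it is only the interplay with the marginal constraint that forces repeated sweeps.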

However, some theoretical issues regarding D-IPFP remain to be addressed, including the existence and uniqueness of the solution and the impact of the input constraint set on the quality of the solution:

(1) Existence: Under what conditions does the input constraint set specify a multivariate joint distribution?

(2) Uniqueness: Assuming such a joint distribution exists, is it unique?

(3) Quality of the input set: How should weakly consistent, inconsistent, or incomplete input sets be handled?

Future work also includes extending D-IPFP to handle input sets with constraints of a more general form, such as {P(A | B)}, where A, B ⊆ X_R = {V_1, ..., V_s} and A ∩ B = ∅. This might be possible because, according to the chain rule, P(A | B) with A = {V_{i_1}, ..., V_{i_k}} can be transformed into a set of constraints of the form P(V_i | C), where C ⊆ {V_1, ..., V_s} \ {V_i}, i.e.,

$$P(V_{i_1}, V_{i_2}, \ldots, V_{i_k} \mid B) = P(V_{i_1} \mid B, V_{i_2}, \ldots, V_{i_k}) \, P(V_{i_2} \mid B, V_{i_3}, \ldots, V_{i_k}) \cdots P(V_{i_{k-1}} \mid B, V_{i_k}) \, P(V_{i_k} \mid B)$$
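As a quick numerical sanity check of the factorization above for k = 3, the following Python sketch draws an arbitrary, strictly positive joint distribution over three variables V1, V2, V3 and a context variable B (all binary and purely hypothetical) and compares both sides for every assignment. It only verifies the chain-rule identity; it does not implement the proposed extension of D-IPFP.

```python
import random
from itertools import product

random.seed(0)
NAMES = ("V1", "V2", "V3", "B")

# Random, strictly positive joint distribution over (V1, V2, V3, B).
states = list(product([True, False], repeat=len(NAMES)))
weights = [random.random() + 0.01 for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}


def p(**fixed):
    """Probability that the named variables take the given values."""
    return sum(pr for s, pr in joint.items()
               if all(s[NAMES.index(k)] == v for k, v in fixed.items()))


def cond(a, given):
    """P(a | given), where a and given are disjoint dicts of assignments."""
    return p(**a, **given) / p(**given)


max_err = 0.0
for v1, v2, v3, bv in product([True, False], repeat=4):
    b = {"B": bv}
    lhs = cond({"V1": v1, "V2": v2, "V3": v3}, b)
    rhs = (cond({"V1": v1}, {**b, "V2": v2, "V3": v3})
           * cond({"V2": v2}, {**b, "V3": v3})
           * cond({"V3": v3}, b))
    max_err = max(max_err, abs(lhs - rhs))
print(f"max |lhs - rhs| = {max_err:.2e}")
```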

In our experiments, we also noticed that the order in which the constraints are applied does not affect the solution, and neither do the values of the initial distribution Q_(0)(X) = P_init(X), provided that they avoid 0 and 1.

V. Reasoning 

The probabilistically extended ontology can support common ontology-related reasoning tasks in the subspace CT, in which all control nodes are set to True. Here we outline how three such tasks can be done in principle; detailed algorithms are under development.

A. Concept Satisfiability 

Given a concept represented by a description e, decide whether P(e | CT) = 0, in which case e is unsatisfiable. P(e | CT) can be computed by applying the chain rule of the BN.
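A minimal Python sketch of this check, assuming a description is given as a predicate over node states (a hypothetical representation), evaluates P(e) by brute-force enumeration with the BN chain rule P(x) = Π P(x_i | π_i). The CPT values are taken from the "Final CPT" columns of Table 6 for four of the regular nodes, purely as sample inputs. Control nodes are omitted, so the sketch computes P(e) rather than P(e | CT), and its numbers are not those of the paper.

```python
from itertools import product

# Four regular nodes with "Final CPT" values from Table 6 (sample inputs only).
# Control nodes are NOT modelled, so this computes P(e), not P(e | CT).
PARENTS = {"Animal": (), "Male": ("Animal",), "Human": ("Animal",),
           "Man": ("Male", "Human")}
# P(node=True | parent states); rows not listed are subclass rows with P = 0.
P_TRUE = {"Animal": {(): 0.92752}, "Male": {(True,): 0.95677},
          "Human": {(True,): 0.18773}, "Man": {(True, True): 0.47049}}
ORDER = ("Animal", "Male", "Human", "Man")  # a topological order


def joint(state):
    """Chain rule of the BN: P(x) = prod_i P(x_i | parents(x_i))."""
    p = 1.0
    for node in ORDER:
        row = tuple(state[q] for q in PARENTS[node])
        pt = P_TRUE[node].get(row, 0.0)
        p *= pt if state[node] else 1.0 - pt
    return p


def prob(description):
    """P(description): sum of the joint over all states satisfying it."""
    total = 0.0
    for values in product([True, False], repeat=len(ORDER)):
        state = dict(zip(ORDER, values))
        if description(state):
            total += joint(state)
    return total


def satisfiable(description, eps=1e-12):
    """A description is treated as unsatisfiable when its probability is 0."""
    return prob(description) > eps


man_not_male = lambda s: s["Man"] and not s["Male"]    # Man ⊓ ¬Male
human_not_man = lambda s: s["Human"] and not s["Man"]  # Human ⊓ ¬Man
print(satisfiable(man_not_male), satisfiable(human_not_man))  # False True
```

Exhaustive enumeration is exponential in the number of nodes; it is used here only because it makes the chain rule explicit.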

B. Concept Overlapping 

The degree of overlap or inclusion between a concept C and a description e can be measured by P(c | e, CT), which can be computed with general BN belief-updating algorithms (c denotes the “True” state of C).
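For a network this small, the posterior P(c | e) can be obtained by exact enumeration as P(c, e) / P(e). The Python sketch below does this on the same four regular nodes, again using the "Final CPT" values of Table 6 only as sample inputs and omitting the control nodes, so it computes P(c | e) rather than P(c | e, CT). Enumeration stands in for a general belief-updating algorithm; a realistic implementation would use a method such as junction-tree propagation instead. All names are illustrative.

```python
from itertools import product

# Sample CPTs ("Final CPT" columns of Table 6); control nodes omitted.
PARENTS = {"Animal": (), "Male": ("Animal",), "Human": ("Animal",),
           "Man": ("Male", "Human")}
P_TRUE = {"Animal": {(): 0.92752}, "Male": {(True,): 0.95677},
          "Human": {(True,): 0.18773}, "Man": {(True, True): 0.47049}}
ORDER = ("Animal", "Male", "Human", "Man")


def states():
    """All joint assignments of the nodes, as dicts."""
    for values in product([True, False], repeat=len(ORDER)):
        yield dict(zip(ORDER, values))


def joint(state):
    """Chain rule of the BN (unlisted parent rows have P(True) = 0)."""
    p = 1.0
    for node in ORDER:
        pt = P_TRUE[node].get(tuple(state[q] for q in PARENTS[node]), 0.0)
        p *= pt if state[node] else 1.0 - pt
    return p


def posterior(concept, evidence):
    """P(concept=True | evidence) by exact enumeration: P(c, e) / P(e)."""
    p_e = sum(joint(s) for s in states() if evidence(s))
    p_ce = sum(joint(s) for s in states() if evidence(s) and s[concept])
    return p_ce / p_e


e = lambda s: s["Animal"] and not s["Man"]  # e = ¬Man ⊓ Animal
for c in ORDER:
    print(c, round(posterior(c, e), 4))
```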

C. Concept Subsumption 

Find the concept C that is most similar to a given description e. This task cannot be done by simply computing the posterior probability P(c | e, CT), because any class node has a probability (prior or posterior) at least as high as that of its children, and the root node always has probability 1. Instead, we define a similarity measure MSC(e, C) between e and C based on the Jaccard coefficient [16]:

$$\mathrm{MSC}(e, C) = \frac{P(e \cap C \mid CT)}{P(e \cup C \mid CT)} = \frac{P(e, c \mid CT)}{P(e \mid CT) + P(c \mid CT) - P(e, c \mid CT)} \qquad (16)$$
This measure is intuitive and easy to compute. When e is a subclass of C (i.e., P(c | e, CT) = 1), P(e, c | CT) = P(e | CT), so the measure reduces to P(e | CT) / P(c | CT), and maximizing it yields the Most-Specific-Subsumer of DL. Otherwise, it selects the class C that has the largest overlap with e. We are also looking at other similarity measures, such as those based on entropy or mutual information.
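Equation (16) is straightforward to compute once the three probabilities are available. The Python sketch below (the function name `msc`, the concept names, and the probabilities in the usage are hypothetical) also illustrates the subsumer case: for every subsumer, P(e, c | CT) = P(e | CT), so the score becomes P(e | CT) / P(c | CT) and the subsumer with the smallest probability, i.e. the most specific one, wins.

```python
def msc(p_ec, p_e, p_c):
    """Jaccard-style similarity of Eq. (16):
    P(e, c | CT) / (P(e | CT) + P(c | CT) - P(e, c | CT))."""
    union = p_e + p_c - p_ec
    if union <= 0.0:
        raise ValueError("P(e ∪ C | CT) must be positive")
    return p_ec / union


# Hypothetical numbers: e has two subsumers with different probabilities.
p_e = 0.1
subsumers = {"C_specific": 0.2, "C_general": 0.5}  # hypothetical P(c | CT)
scores = {name: msc(p_e, p_e, p_c) for name, p_c in subsumers.items()}
print(scores)
print(max(scores, key=scores.get))  # C_specific
```

A concept disjoint from e gets P(e, c | CT) = 0 and therefore a score of 0, regardless of its size.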

In our example ontology (see Fig. 5), to find the concept that is most similar to the description e = ¬Man ⊓ Animal, we compute the similarity measure between e and each of the nodes in X_R = {Animal, Male, Female, Human, Man, Woman} using (16):

MSC(e, Animal) = 0.4755,
MSC(e, Male) = 0.4506,
MSC(e, Female) = 0.5047,
MSC(e, Human) = 0.0510,
MSC(e, Man) = 0.0,
MSC(e, Woman) = 0.0536.

This leads us to conclude that class “Female” is the most 
similar concept to e , since it has the highest similarity 
measure among all nodes in this particular example. 
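The final selection step is simply an argmax over the reported scores. A short Python sketch using the six MSC values listed above (the variable names are illustrative):

```python
# MSC values reported above for e = ¬Man ⊓ Animal.
msc_values = {"Animal": 0.4755, "Male": 0.4506, "Female": 0.5047,
              "Human": 0.0510, "Man": 0.0, "Woman": 0.0536}

best = max(msc_values, key=msc_values.get)
ranking = sorted(msc_values, key=msc_values.get, reverse=True)
print(best)     # Female
print(ranking)  # ['Female', 'Animal', 'Male', 'Woman', 'Human', 'Man']
```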

VI. Conclusion, Related Work and Discussion 

In this paper we have presented our ongoing research on a probabilistic extension to OWL. We have defined new OWL classes (“PriorProb”, “CondProb”, and “Variable”), which can be used to mark up probabilities for classes in OWL files. We have also defined a set of rules for translating an OWL ontology taxonomy into a Bayesian network DAG, and provided a new algorithm, D-IPFP, to construct the CPTs of all the regular nodes.

Our probabilistic extension to OWL is compatible with OWL semantics, and the translated BN is associated with a joint probability distribution over the application domain that is consistent with the given probabilities. We are currently working on extending the translation to include properties, developing algorithms to support common ontology-related reasoning tasks, and formalizing the mapping between two ontologies as probabilistic reasoning across two translated BNs. Once these issues are resolved and the framework is further refined, we plan to implement a prototype that can automatically translate a given OWL ontology with uncertainty information into a BN and can also support common ontology-based reasoning tasks.

Researchers have previously attempted to apply different formalisms, such as fuzzy logic, rough set theory, and Bayesian probability, as well as ad hoc heuristics, to ontology definition and reasoning (see [10] for a brief survey). Works that integrate probabilities into description-logic-based systems (e.g., [9, 11, 12, 13, 14]) are particularly relevant to our work. The works in [12, 13] provide a probabilistic extension of the DL ALC based on probabilistic logics. P-CLASSIC [14] gives an informal probabilistic extension to CLASSIC, also based on Bayesian networks, in which each probabilistic component is associated with a set of p-classes, each of which is represented by a BN. P-SHOQ(D) [11] is a probabilistic extension of the DL SHOQ(D) [15] based on the notion of probabilistic lexicographic entailment from probabilistic default reasoning. Among these works, only P-SHOQ(D) is able to represent assertional (i.e., ABox) probabilistic knowledge about concept and role instances. The primary difference between [9] and our work is that in [9] the links point from subconcepts to superconcepts, which makes the construction of CPTs difficult. Our method is not aimed at providing additional means to represent uncertainty or the probabilistic aspects of the domain, but rather at developing formal rules to directly translate an OWL ontology into a Bayesian network.